Word2VisualVec: Cross-Media Retrieval by Visual Feature Prediction

Authors

  • Jianfeng Dong
  • Xirong Li
  • Cees Snoek
Abstract

This paper attacks the challenging problem of cross-media retrieval. That is, given an image, find the text best describing its content, or the other way around. Different from existing works, which rely on either a joint space or a text space, we propose to perform cross-media retrieval in a visual space only. We contribute Word2VisualVec, a deep neural network architecture that learns to predict a deep visual encoding of textual input. We discuss its architecture for prediction of CaffeNet and GoogleNet features, as well as its loss functions for learning from text/image pairs in large-scale click-through logs and image sentences. Experiments on the Clickture-Lite and Flickr8K corpora demonstrate the robustness for both Text-to-Image and Image-to-Text retrieval, outperforming the state-of-the-art on both accounts. Interestingly, an embedding in predicted visual feature space is also highly effective when searching in text only.
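The idea of retrieving in a visual space only can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the text encoding or the loss, so the sketch assumes a bag-of-words encoding and swaps the deep network for a single linear map fitted by ridge-regularised least squares. Random vectors stand in for CaffeNet or GoogleNet features, and all function names (`bag_of_words`, `fit_text_to_visual`, `rank_images`) are hypothetical.

```python
import numpy as np

def bag_of_words(sentence, vocab):
    """Encode a sentence as a word-count vector (assumed text encoding)."""
    vec = np.zeros(len(vocab))
    for word in sentence.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def fit_text_to_visual(text_vecs, visual_feats, ridge=1e-3):
    """Fit W so that text_vecs @ W approximates visual_feats.

    Ridge-regularised least squares: a linear stand-in for the deep
    network described in the abstract, not the paper's architecture.
    """
    dim = text_vecs.shape[1]
    gram = text_vecs.T @ text_vecs + ridge * np.eye(dim)
    return np.linalg.solve(gram, text_vecs.T @ visual_feats)

def rank_images(query_vec, W, image_feats):
    """Rank images by cosine similarity to the predicted visual vector."""
    pred = query_vec @ W
    pred = pred / (np.linalg.norm(pred) + 1e-12)
    norms = np.linalg.norm(image_feats, axis=1, keepdims=True) + 1e-12
    return np.argsort(-((image_feats / norms) @ pred))

# Toy data: three captioned images with made-up 8-dimensional features.
vocab = {"dog": 0, "cat": 1, "grass": 2, "sofa": 3}
captions = ["dog on grass", "cat on sofa", "dog on sofa"]
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(3, 8))  # placeholder for CNN features

X = np.stack([bag_of_words(c, vocab) for c in captions])
W = fit_text_to_visual(X, image_feats)

# Text-to-Image: project the query into visual space, then rank images.
ranking = rank_images(bag_of_words("cat on sofa", vocab), W, image_feats)
print(ranking)
```

Image-to-Text retrieval works the same way in this sketch: project every candidate sentence through `W` and rank the resulting vectors against the image's own feature vector, so that all comparisons still take place in the visual space.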

Related resources

Predicting Visual Features from Text for Image and Video Caption Retrieval

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute Word2VisualVec, a deep neural network architecture that learns to predict...


A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a 64-dimensional descriptor for each interest point; then, to use the Bag of Visual Words model, a cluster...


Semantic-Based Cross-Media Image Retrieval

In this paper, we propose a novel method for cross-media semantic-based information retrieval, which combines classical text-based and content-based image retrieval techniques. This semantic-based approach aims at determining the strong relationships between keywords (in the caption) and types of visual features associated with its typical images. These relationships are then used to retrieve im...


Learning a Semantic Space by Deep Network for Cross-media Retrieval

With the growth of multimedia data, the problem of cross-media (or cross-modal) retrieval has attracted considerable interest in the cross-media retrieval community. One of the solutions is to learn a common representation for multimedia data. In this paper, we propose a simple but effective deep learning method to address the cross-media retrieval problem between images and text documents for ...


Journal:
  • CoRR

Volume: abs/1604.06838

Publication date: 2016